This aggregated cheatsheet of our tutorial package Introduction to programming with R summarizes the individual notes from the provided tutorials and additional information from recommended sources.
- == tests for equality; != for inequality
- = assigns values to function arguments (don't use it for object assignment!)
- Logical values: TRUE/FALSE
  - combined via & (AND), | (OR), and ! (NOT)
- Functions are called with () after their name
- c() combines/concatenates multiple values into one vector object
- install.packages() installs a new package via its name from the console (also possible via the ‘Packages’ menu in RStudio), e.g.
  - ggplot2 for visualization
  - dplyr for data transformation
  - tidyverse (a collection of packages including ggplot2 and dplyr)
- library() loads a package via its name
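A minimal base R sketch of the operators listed above. The vector x is made up for illustration; the comments show what R returns for it.

```r
# a small numeric vector, combined with c()
x <- c(1, 5, 10)

is_five  <- x == 5           # FALSE  TRUE FALSE
not_five <- x != 5           #  TRUE FALSE  TRUE
in_range <- x > 2 & x < 8    # FALSE  TRUE FALSE
outside  <- x < 2 | x > 8    #  TRUE FALSE  TRUE
negated  <- !in_range        #  TRUE FALSE  TRUE

# '=' passes a value to a function argument, '<-' stores the result in an object
rounded <- round(3.14159, digits = 2)
print(rounded)               # [1] 3.14
```

All comparisons work element-wise, so each result is a logical vector of the same length as x.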
- help() provides detailed information on functions, data sets, and packages
  - ? is the short version, e.g. ?head
- head() shows the first elements of an object
- glimpse() (from the dplyr package) gives a compact view of the variables of tabular data
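A short sketch for inspecting a data set, using R's built-in mtcars data; in an interactive session, help(head) or ?head would open the matching help page.

```r
library(dplyr)   # provides glimpse()

first_rows <- head(mtcars, n = 3)   # first 3 rows of the built-in mtcars data
print(first_rows)

glimpse(mtcars)   # one line per variable: name, type and first values
```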
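- A command is incomplete (the console then shows a + prompt and waits for more input) if it ends with an operator such as +, <-, =, … or lacks a closing ), ], }, or closing quote, i.e. " or '
- ggplot2 is the visualization package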
  - ggplot() defines what data to plot (and general/shared aesthetics)
    - data = first argument = the tidy data table to visualize
  - geom_...() defines how to plot the data, i.e. the plot geometry, e.g.
    - geom_point() (Summary page)
    - geom_line() (Summary page)
    - geom_smooth() (Summary page)
  - + combines different geometries, labels, etc. into one plot, e.g.
    - ggplot(mtcars, mapping=aes(x=wt, y=mpg)) + geom_point() + xlab("weight")
  - the mapping argument takes aes() aesthetics, which define what data variables from the input table are used where in the plot, i.e. their mapping (Summary page)
    - within ggplot(): applied to all subsequent geom_*() functions, e.g.
      - ggplot(mtcars, mapping=aes(x=wt, y=mpg)) + geom_point()
    - within geom_point(): only used for this point drawing, e.g.
      - ggplot(mtcars) + geom_point(mapping=aes(x=wt, y=mpg))
- Object names may contain . and _, and they are case-sensitive, i.e. myname is not myName
- <- for object assignment (RStudio hot key = ‘ALT’ + ‘-’)
- seq() generates a vector of consecutive numbers, e.g. seq(2,5) is equal to c(2,3,4,5)
- Strings can be written with '' or "", both work
- Typing just an object name like x is equal to print(x)
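A sketch contrasting the two aes() placements from the ggplot2 notes above, using the built-in mtcars data. The extra geom_smooth(method = "lm") layer is an addition for illustration: it shows that a mapping given in ggplot() is reused by a second geometry without repeating it.

```r
library(ggplot2)

# aes() in ggplot(): shared by all following geom_*() layers
p_shared <- ggplot(mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +  # reuses x = wt, y = mpg
  xlab("weight")

# aes() in geom_point(): used by this point layer only
p_local <- ggplot(mtcars) +
  geom_point(mapping = aes(x = wt, y = mpg)) +
  xlab("weight")

# printing a plot object draws it
print(p_shared)
print(p_local)
```

In p_local, any further geom would need its own aes() mapping, because ggplot() itself received no mapping.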
- () around an assignment triggers printing implicitly, e.g. (x <- 5)
- tibble = data structure to store tabular data (tidyverse extension of the data.frame structure)
  - column-wise creation (as done for data.frame or list): p2 <- tibble( x = 0:3, y = c(1,2,4,8))
  - later columns can use earlier ones: p2 <- tibble( x=0:3, y=2^x)
  - row-wise creation: p2 <- tribble( ~x, ~y, 0, 1, 1, 2, 2, 4, 3, 8) (put line breaks where appropriate)
  - column access
    - $colName = dollar + name-based (most recommended), e.g. p2$y
    - [["colName"]] = double square brackets + quoted name as done for lists, e.g. p2[["y"]]
    - [[2]] = double square brackets + index-based (try to avoid), e.g. p2[[2]]
    - [] = single square brackets + index returns a (one-column) tibble, not a column vector!
  - column names with spaces or special characters
    - table[["strange name"]] - quotes in double-bracket usage
    - table$`strange name` - back-ticks in direct name-based access
  - glimpse(p2) to check column names and respective data types
  - as_tibble() converts an object into a tibble (if possible)
  - is_tibble() checks whether it is a tibble
- str() provides the data structure of an R object, i.e. how it is organized and what’s inside
- Vectors are created via c() (concatenate)
- Logical values (TRUE or FALSE)
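  - comparisons: ==, !=, >, <, <=, >=
  - combinations: &, |, xor()
  - negation: !
  - membership in a vector: %in% (a %notin% counterpart is not part of base R or dplyr; negate via !(x %in% v))
  - range check: between() (dplyr)
  - near() (dplyr) is better for floating point number comparison than ==
- filter() prunes a table to the rows of interest (Summary page)
  - rows whose condition evaluates to TRUE pass the filter and are kept; all others (including NA) are removed
  - multiple comma-separated conditions are combined via &
- %>% piping for connection of data transformation steps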
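A sketch combining the tibble and filter() notes above. p2 follows the examples in the text; the odd and kept tables and the mtcars filter conditions are made-up illustrations.

```r
library(tibble)
library(dplyr)   # near(), between(), filter(), %>%

# column-wise creation; y is computed from the previously defined x
p2 <- tibble(x = 0:3, y = 2^x)

# the same values, entered row-wise
p2_rows <- tribble(
  ~x, ~y,
   0,  1,
   1,  2,
   2,  4,
   3,  8
)

y_dollar <- p2$y         # vector 1 2 4 8
y_name   <- p2[["y"]]    # same vector, quoted name
y_index  <- p2[[2]]      # same vector, by position
y_table  <- p2[2]        # one-column tibble, not a vector

# back-ticks for names containing spaces
odd <- tibble(`strange name` = c("a", "b"))
odd_col <- odd$`strange name`

# floating point: == fails, near() tolerates rounding errors
exact  <- (0.1 + 0.2) == 0.3    # FALSE
approx <- near(0.1 + 0.2, 0.3)  # TRUE

# filter(): comma-separated conditions are combined with &
picked <- mtcars %>%
  filter(cyl %in% c(4, 6), between(mpg, 20, 25)) %>%
  filter(!(gear %in% 5))

# rows where the condition is NA are dropped as well
kept <- tibble(v = c(1, NA, 3)) %>% filter(v > 1)   # only v = 3
```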
- summary() gives a statistical overview of the values in a data structure, e.g. for each variable of a data frame
- arrange() changes the order of rows (Summary page)
  - desc() around a variable triggers descending sorting w.r.t. the variable’s values
  - NA is always sorted last
- select() reduces the columns to the variables of interest (Summary page)
  - variables listed by name (separated by commas or combined via c())
  - : generates a sequence from:to including both boundaries (in select(): a range of consecutive columns)
  - everything() : all variables not named so far (useful for reordering of columns)
  - ! or - negation removes the specified variables (also works for vectors via c() or :)
  - starts_with(), ends_with(), contains() : parts of variable names
  - matches() : regular expression on names (discussed later)
  - num_range() : combines a prefix string and a number vector to full names like x1, x2, …
  - any_of() : listed variable names that need not be present in the data table (useful for negative selection without errors/warnings)
- mutate() creates new variables (columns) (Summary page)
  - NAME = EXPRESSION
  - … mutate() calls
- transmute() creates a new table with only the new variables (dropping all other variables of the input table)
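A sketch of arrange(), select(), mutate() and transmute() on the built-in mtcars data. Turning the row names into a model column via as_tibble(rownames = "model") and the hp_per_wt variable are illustrative choices, not part of the tutorial.

```r
library(dplyr)

# keep the car names (row names of mtcars) as a regular column
car_tbl <- as_tibble(mtcars, rownames = "model")

# arrange(): highest mpg first, ties broken by ascending wt
by_mpg <- car_tbl %>% arrange(desc(mpg), wt)

# NA is sorted last, also in descending order
na_last <- tibble(v = c(2, NA, 1)) %>% arrange(desc(v))   # 2, 1, NA

# select(): range, helper function, negation, tolerant negation, reordering
sel_range  <- car_tbl %>% select(model, mpg:disp)           # model, mpg, cyl, disp
sel_helper <- car_tbl %>% select(model, starts_with("d"))    # model, disp, drat
sel_drop   <- car_tbl %>% select(-c(vs, am), -(gear:carb))
sel_safe   <- car_tbl %>% select(-any_of(c("am", "no_such_column")))  # no error
sel_first  <- car_tbl %>% select(hp, everything())           # hp moved to the front

# mutate(): NAME = EXPRESSION, evaluated for every row
with_ratio <- car_tbl %>% mutate(hp_per_wt = hp / wt, heavy = wt > 3.5)

# transmute(): only the listed/new variables remain
only_ratio <- car_tbl %>% transmute(model, hp_per_wt = hp / wt)
```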
- Vectorized operations: for A-B, the respective A and B values of each row are subtracted (creating a new vector with the same length as both columns; one result for each row in the same row order)
  - arithmetic: +, -, *, /, ^ (power), log..()
  - integer division and remainder: %/% and %%
  - comparisons: <, ==, !=, >=, …
  - logical (TRUE/FALSE): &, |, !
  - cum...() = cumulative …, e.g.
    - cumsum()/cumprod() = running sum/product of consecutive values (= vector of values)
    - cummin()/cummax() = min/max of all rows up to here
    - cummean()
  - ranking: min_rank() or dense_rank() (typically one of both is what you are looking for), percent_rank(), row_number(), …
  - offsets by n rows
    - lag() = previous values, i.e. shifting the column down (NAs at the beginning)
    - lead() = following values, i.e. shifting the column up (NAs at the end)
- summary functions
  - sum() = sum of all values in the variable column (= single value)
  - mean()
- recycling: c(1,2,3,4)+1 and c(1,2,3,4)+c(1,2) work since the shorter vector is repeated, while c(1,2,3)+c(1,2) does not (3 is not a multiple of 2, so R only recycles with a warning)
- summarize() produces only a single output row (per group) (Summary page)
  - n() = number of rows (of the current group)
- group_by() decomposes the observations into groups (Summary page)
  - successive group_by() calls overwrite the grouping of the previous one (unless .add = TRUE)!
  - ungroup() explicitly undoes the previous group_by() and merges the subtables back into one
  - useful summary functions: mean(), median(), quantile(), sd()
  - min(), max(), first(), last(), nth()
  - n(), n_distinct()
  - sum()
    - sum() of logical values = number of TRUEs (since TRUE is scored 1 and FALSE 0)
  - grouping also affects mutate(), filter(), …
  - NA handling is important (set the na.rm=TRUE argument if wanted)
- count() is a summary based on counting or summation only
- rename() changes variable names (Summary page)
  - <NEWNAME> = <OLDVAR> where OLDVAR can be …
- colnames() gives you the vector of a table’s column names (base R)
- slice functions extract specific rows (Summary page)
  - slice() - rows specified by indices
  - slice_min(), slice_max() - rows with the smallest/largest values of a given variable
  - slice_head(), slice_tail() - first/last rows w.r.t. the current row order
  - slice_sample() - pick rows at random
  - n=.. argument for the number of rows
  - prop=.. argument to specify a fraction of rows to keep
- distinct() reduces the table to unique observations (Summary page)
  - with .keep_all=TRUE no columns are removed from the output
- join functions fuse information from multiple tables into one
  - by = "X" - same column name X in both tables
  - by = c("X" = "Y") - merge based on the X column of the first table and the Y column of the second table
  - rows without a match get NA entries in the columns of the other table
  - left_join() keeps all rows from the first table
  - inner_join() - only observations with matching values in both tables
  - full_join() - all rows from both tables
- … dplyr package cover even more verbs, see …
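A sketch of the window functions, grouped summaries, counting, renaming, slicing and joins described above. The tables sales and shop_info are hypothetical example data; the commented behavior follows from those values.

```r
library(dplyr)

# hypothetical daily sales of two shops (one value missing)
sales <- tibble(
  shop  = c("A", "A", "A", "B", "B"),
  day   = c(1, 2, 3, 1, 2),
  units = c(5, 3, NA, 4, 6)
)

# window functions work per group inside a grouped mutate()
running <- sales %>%
  group_by(shop) %>%
  mutate(
    total_so_far = cumsum(units),          # NA propagates to later rows
    prev_units   = lag(units),             # NA in each shop's first row
    rank_units   = min_rank(desc(units))   # 1 = most units within the shop
  ) %>%
  ungroup()

# summarize(): one row per group; na.rm = TRUE skips the missing value
per_shop <- sales %>%
  group_by(shop) %>%
  summarize(
    n_days     = n(),
    mean_units = mean(units, na.rm = TRUE),
    big_days   = sum(units > 4, na.rm = TRUE)   # number of TRUEs
  )

counts   <- sales %>% count(shop)               # same counts as n() above
renamed  <- sales %>% rename(store = shop)      # NEWNAME = OLDVAR
best_day <- sales %>% slice_max(units, n = 1)   # row with the most units
shops    <- sales %>% distinct(shop)

# hypothetical lookup table whose key column has a different name
shop_info <- tibble(id = c("A", "C"), city = c("Berlin", "Jena"))

joined_left  <- left_join(sales, shop_info, by = c("shop" = "id"))   # 5 rows, city NA for B
joined_inner <- inner_join(sales, shop_info, by = c("shop" = "id"))  # 3 rows of shop A
joined_full  <- full_join(sales, shop_info, by = c("shop" = "id"))   # 5 rows + unmatched C
```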
- readr package (Summary page)
  - read_delim() - columns are separated by a single character given to the delim argument
  - read_csv() - using , as column delimiter
  - read_csv2() - columns delimited by ; and German ,-based decimal number notation
  - read_tsv() - tab as column delimiter
  - read_fwf() - fixed width column specification (fixed number of characters per column)
  - the locale argument defines language specifications like decimal separator, names of days/months, date/time encoding, character encodings, …
    - locale=locale(decimal_mark=",", grouping_mark=".") = German number notation
    - locale=locale(encoding="latin1") = using the Latin-1 character encoding typically used on Microsoft Windows systems to enable a correct rendering of umlauts etc.
  - quote='"' to specify the quotation character
  - col_types=list(colName="c") allows to specify the data type of (individual) columns
  - na='--' takes a vector of (string) values to be treated as not available (NA)
  - compressed files are decompressed automatically (.gz, .bz2, .xz)
- the working directory of the R session defines from/to where to read/write files (enables relative path specification!)
  - getwd() and setwd() get and set it
- write functions
  - write_csv(DATATABLE, FILENAME)
  - write_csv2() if you need a , decimal separator (no manual locale specification possible!)
- readxl for MS Excel import (Summary page)
- writexl for (simple) MS Excel export (Summary page)
- googlesheets4 for import/export of online Google Sheets (Summary page)
- tidyr package (Summary page)
  - pivot_longer() = column names go to a “names” column (and values to one column)
  - pivot_wider() = values from one column are distributed over multiple columns
  - NA handling
    - fill() replaces NA with the previous/subsequent non-NA value in a column
    - complete() adds implicitly missing value combinations
    - drop_na() removes rows with NA
    - replace_na() sets a new value for NA entries
  - separate() decomposes a variable’s text values into multiple columns
  - unite() joins multiple variables into one text column
- Strings
  - written in single ' or double " quotes
  - escaping \ of special characters like line-break \n, tabulator \t, …
  - Unicode characters via \uXXXX (up to 4 hex digits) and \UXXXXXXXX (up to 8 hex digits), e.g. "greek alpha = \u03B1"
  - writeLines() : final text output in the console (e.g. for testing/checking)
  - str_length() : number of characters
  - str_c() or paste(): concatenation of strings
    - sep string placed between the concatenated arguments
    - collapse string to merge a vector into a single string
  - str_to_lower(), str_to_upper(), str_to_title() : capitalization conversion
    - locale for language specifics
  - str_sort() : lexicographic sorting
    - locale specific
  - str_trim()/str_pad() : remove/add whitespace at the strings’ ends
- Regular expressions
  - . : any character (but no newline per default)
  - \\d (\\D) : (not a) digit
  - \\s (\\S) : (not a) whitespace, i.e. space, tabulator, …
  - \\w : (English) word character, i.e. letter, digit or underscore
  - [] : explicit character list (use [^] for negation of the list)
  - ^ : beginning of the text
  - $ : end of the text
  - \\b : word boundary (left or right)
  - | alternative separator, e.g. a|b matches a or b
  - () grouping to define blocks of a pattern, e.g. to specify alternatives
  - ? : 0 or 1 times
  - * : 0 or multiple times
  - + : 1 or multiple times
  - {} : explicit counts {3}, ranges {2,4} or “at least/most” {2,},{,3}
  - quantifiers are greedy: str_view("aaaa","a+") reports the whole string as match and not only one letter!
  - quantifiers also apply to groups: (ab)+ matches "ababab"
  - str_view() shows pattern matches within the input strings (e.g. for regex testing)
    - match=TRUE : show hits only (match=FALSE lists the non-matching strings)
  - str_subset() provides the subset of input elements (a vector of strings) that match the given regex, e.g. str_subset(c("a","b","c"), "a|b")

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Feel free to contribute at https://github.com/Dr-Eberle-Zentrum/Introduction-to-programming-with-R.